Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Intrinsically motivated spontaneous exploration is a key enabler of
autonomous lifelong learning in human children. It enables the discovery and
acquisition of large repertoires of skills through self-generation,
self-selection, self-ordering and self-experimentation of learning goals. We
present an algorithmic approach called Intrinsically Motivated Goal Exploration
Processes (IMGEP) to enable similar properties of autonomous or self-supervised
learning in machines. The IMGEP algorithmic architecture relies on several
principles: 1) self-generation of goals, generalized as fitness functions; 2)
selection of goals based on intrinsic rewards; 3) exploration with incremental
goal-parameterized policy search and exploitation of the gathered data with a
batch learning algorithm; 4) systematic reuse of information acquired when
targeting a goal for improving towards other goals. We present a particularly
efficient form of IMGEP, called Modular Population-Based IMGEP, that uses a
population-based policy and an object-centered modularity in goals and
mutations. We provide several implementations of this architecture and
demonstrate their ability to automatically generate a learning curriculum
within several experimental setups including a real humanoid robot that can
explore multiple spaces of goals with several hundred continuous dimensions.
While no particular target goal is provided to the system, this curriculum
allows the discovery of skills that act as stepping stones for learning more
complex skills, e.g. nested tool use. We show that learning diverse spaces of
goals with intrinsic motivations is more efficient for learning complex skills
than only trying to directly learn these complex skills.
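The four principles listed in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy environment, the function names, the parameter bounds and the noise scale `sigma` are all hypothetical, goals are drawn uniformly at random rather than selected with intrinsic rewards, and the object-centered modularity is left out.

```python
import math
import random


def toy_environment(params):
    """Hypothetical environment: maps 2 policy parameters to a 2D outcome."""
    a, b = params
    return (math.sin(a) * math.cos(b), a * b)


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def goal_exploration(n_iterations=200, n_bootstrap=10, sigma=0.1, seed=0):
    rng = random.Random(seed)
    memory = []  # (params, outcome) pairs, shared across all goals

    def rollout(params):
        memory.append((params, toy_environment(params)))

    # Bootstrap the memory with random policy parameters.
    for _ in range(n_bootstrap):
        rollout(tuple(rng.uniform(-1.0, 1.0) for _ in range(2)))

    for _ in range(n_iterations):
        # Self-generate a goal in outcome space (uniform sampling is an
        # assumption; the abstract describes intrinsic-reward selection).
        goal = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        # Reuse everything gathered so far: take the stored policy whose
        # outcome is closest to the goal, whatever goal produced it.
        best_params, _ = min(
            memory, key=lambda pair: squared_distance(pair[1], goal)
        )
        # Explore by mutating that policy with clipped Gaussian noise.
        candidate = tuple(
            max(-1.0, min(1.0, p + rng.gauss(0.0, sigma))) for p in best_params
        )
        rollout(candidate)
    return memory
```

Calling `goal_exploration(n_iterations=50, n_bootstrap=5)` returns 55 stored rollouts. Because every rollout is stored and can be retrieved for any later goal, data collected while targeting one goal helps with the others, which is the reuse principle in item 4.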
Automatic Curriculum Learning For Deep RL: A Short Survey
Automatic Curriculum Learning (ACL) has become a cornerstone of recent
successes in Deep Reinforcement Learning (DRL). These methods shape the learning
trajectories of agents by challenging them with tasks adapted to their
capacities. In recent years, they have been used to improve sample efficiency
and asymptotic performance, to organize exploration, to encourage
generalization or to solve sparse reward problems, among others. The ambition
of this work is dual: 1) to present a compact and accessible introduction to
the Automatic Curriculum Learning literature and 2) to draw a bigger picture of
the current state of the art in ACL to encourage the cross-breeding of existing
concepts and the emergence of new ideas. Comment: Accepted at IJCAI202
Trying AGAIN instead of Trying Longer: Prior Learning for Automatic Curriculum Learning
A major challenge in the Deep RL (DRL) community is to train agents able to
generalize over unseen situations, which is often approached by training them
on a diversity of tasks (or environments). A powerful method to foster
diversity is to procedurally generate tasks by sampling their parameters from a
multi-dimensional distribution, which in particular allows proposing a different
task for each training episode. In practice, to get the high diversity of
training tasks necessary for generalization, one has to use complex procedural
generation systems. With such generators, it is hard to get prior knowledge on
which tasks are actually learnable at all (many generated tasks may be
unlearnable), on their relative difficulty, and on the most efficient ordering
of task distributions for training. A typical solution in such
cases is to rely on some form of Automated Curriculum Learning (ACL) to adapt
the sampling distribution. One limitation of current approaches is their need to
explore the task space to detect progress niches over time, which costs
training time. Additionally, we hypothesize that the induced noise in the
training data may impair the performance of brittle DRL learners. We address
this problem by proposing a two stage ACL approach where 1) a teacher algorithm
first learns to train a DRL agent with a high-exploration curriculum, and then
2) distills learned priors from the first run to generate an "expert
curriculum" to re-train the same agent from scratch. Besides demonstrating 50%
improvements on average over the current state of the art, the objective of
this work is to give a first example of a new research direction oriented
towards refining ACL techniques over multiple learners, which we call Classroom
Teaching. Comment: Accepted to the ICLR 2020 workshop Beyond tabula rasa in RL (BeTR-RL)
SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents
Building embodied autonomous agents capable of participating in social
interactions with humans is one of the main challenges in AI. Within the Deep
Reinforcement Learning (DRL) field, this objective motivated multiple works on
embodied language use. However, current approaches focus on language as a
communication tool in very simplified and non-diverse social situations: the
"naturalness" of language is reduced to the concept of high vocabulary size and
variability. In this paper, we argue that aiming towards human-level AI
requires a broader set of key social skills: 1) language use in complex and
variable social contexts; 2) beyond language, complex embodied communication in
multimodal settings within constantly evolving social worlds. We explain how
concepts from cognitive sciences could help AI to draw a roadmap towards
human-like intelligence, with a focus on its social dimensions. As a first
step, we propose to expand current research to a broader set of core social
skills. To do this, we present SocialAI, a benchmark to assess the acquisition
of social skills of DRL agents using multiple grid-world environments featuring
other (scripted) social agents. We then study the limits of a recent SOTA DRL
approach when tested on SocialAI and discuss important next steps towards
proficient social agents. Videos and code are available at
https://sites.google.com/view/socialai. Comment: under review. This paper extends and generalizes work in
arXiv:2104.1320
TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL
Training autonomous agents able to generalize to multiple tasks is a key
target of Deep Reinforcement Learning (DRL) research. In parallel to improving
DRL algorithms themselves, Automatic Curriculum Learning (ACL) studies how
teacher algorithms can train DRL agents more efficiently by adapting task
selection to their evolving abilities. While multiple standard benchmarks exist
to compare DRL agents, there is currently no such thing for ACL algorithms.
Thus, comparing existing approaches is difficult, as too many experimental
parameters differ from paper to paper. In this work, we identify several key
challenges faced by ACL algorithms. Based on these, we present TeachMyAgent
(TA), a benchmark of current ACL algorithms leveraging procedural task
generation. It includes 1) challenge-specific unit-tests using variants of a
procedural Box2D bipedal walker environment, and 2) a new procedural Parkour
environment combining most ACL challenges, making it ideal for global
performance assessment. We then use TeachMyAgent to conduct a comparative study
of representative existing approaches, showcasing the competitiveness of some
ACL algorithms that do not use expert knowledge. We also show that the Parkour
environment remains an open problem. We open-source our environments, all
studied ACL algorithms (collected from open-source code or re-implemented), and
DRL students in a Python package available at
https://github.com/flowersteam/TeachMyAgent
Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments
We consider the problem of how a teacher algorithm can enable an unknown Deep Reinforcement Learning (DRL) student to become good at a skill over a wide range of diverse environments. To do so, we study how a teacher algorithm can learn to generate a learning curriculum, whereby it sequentially samples parameters controlling a stochastic procedural generation of environments. Because it does not initially know the capacities of its student, a key challenge for the teacher is to discover which environments are easy, difficult or unlearnable, and in what order to propose them to maximize the efficiency of learning over the learnable ones. To achieve this, this problem is transformed into a surrogate continuous bandit problem where the teacher samples environments in order to maximize absolute learning progress of its student. We present a new algorithm modeling absolute learning progress with Gaussian mixture models (ALP-GMM). We also adapt existing algorithms and provide a complete study in the context of DRL. Using parameterized variants of the BipedalWalker environment, we study their efficiency to personalize a learning curriculum for different learners (embodiments), their robustness to the ratio of learnable/unlearnable environments, and their scalability to non-linear and high-dimensional parameter spaces. Videos and code are available at https://github.com/flowersteam/teachDeepRL
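The idea of a teacher that samples environments in proportion to absolute learning progress can be sketched as follows. This is a simplified stand-in, not ALP-GMM: the Gaussian mixture model is replaced by a fixed grid of bins over a single task parameter in [0, 1), and the class name, the smoothing factor of 0.5 and the `random_ratio` exploration rate are assumptions made for illustration.

```python
import random


class BinnedALPTeacher:
    """Toy teacher: samples task bins proportionally to their mean
    absolute learning progress (ALP). Bins replace the GMM here."""

    def __init__(self, n_bins=5, random_ratio=0.2, seed=0):
        self.n_bins = n_bins
        self.random_ratio = random_ratio
        self.rng = random.Random(seed)
        self.last_reward = [None] * n_bins  # latest reward seen per bin
        self.alp = [0.0] * n_bins           # running ALP estimate per bin

    def sample_task(self):
        total = sum(self.alp)
        # Keep some uniform exploration, and rely on it entirely while
        # no learning progress has been measured yet.
        if total == 0.0 or self.rng.random() < self.random_ratio:
            bin_index = self.rng.randrange(self.n_bins)
        else:
            bin_index = self.rng.choices(
                range(self.n_bins), weights=self.alp
            )[0]
        low = bin_index / self.n_bins
        return low + self.rng.random() / self.n_bins

    def update(self, task, reward):
        bin_index = min(int(task * self.n_bins), self.n_bins - 1)
        previous = self.last_reward[bin_index]
        if previous is not None:
            # Absolute learning progress: magnitude of the reward change,
            # so both improvement and forgetting attract sampling.
            progress = abs(reward - previous)
            self.alp[bin_index] = 0.5 * self.alp[bin_index] + 0.5 * progress
        self.last_reward[bin_index] = reward
```

In use, the training loop would alternate `task = teacher.sample_task()`, one student episode on that task, and `teacher.update(task, episode_reward)`. Bins where the reward stays flat, whether because the tasks are already mastered or because they are unlearnable, keep an ALP near zero and are sampled mostly through the uniform exploration share.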
Large Language Models as Superpositions of Cultural Perspectives
Large Language Models (LLMs) are often misleadingly recognized as having a
personality or a set of values. We argue that an LLM can be seen as a
superposition of perspectives with different values and personality traits.
LLMs exhibit context-dependent values and personality traits that change based
on the induced perspective (as opposed to humans, who tend to have more
coherent values and personality traits across contexts). We introduce the
concept of perspective controllability, which refers to a model's affordance to
adopt various perspectives with differing values and personality traits. In our
experiments, we use questionnaires from psychology (PVQ, VSM, IPIP) to study
how exhibited values and personality traits change based on different
perspectives. Through qualitative experiments, we show that LLMs express
different values when those are (implicitly or explicitly) implied in the
prompt, and that LLMs express different values even when those are not
obviously implied (demonstrating their context-dependent nature). We then
conduct quantitative experiments to study the controllability of different
models (GPT-4, GPT-3.5, OpenAssistant, StableVicuna, StableLM), the
effectiveness of various methods for inducing perspectives, and the smoothness
of the models' drivability. We conclude by examining the broader implications
of our work and outline a variety of associated scientific questions. The
project website is available at
https://sites.google.com/view/llm-superpositions. Comment: Preprint
Language Grounding through Social Interactions and Curiosity-Driven Multi-Goal Learning
Autonomous reinforcement learning agents, like children, do not have access to predefined goals and reward functions. They must discover potential goals, learn their own reward functions and engage in their own learning trajectory. Children, however, benefit from exposure to language, helping to organize and mediate their thought. We propose LE2 (Language Enhanced Exploration), a learning algorithm leveraging intrinsic motivations and natural language (NL) interactions with a descriptive social partner (SP). Using NL descriptions from the SP, it can learn an NL-conditioned reward function to formulate goals for intrinsically motivated goal exploration and learn a goal-conditioned policy. By exploring, collecting descriptions from the SP and jointly learning the reward function and the policy, the agent grounds NL descriptions into real behavioral goals. From simple goals discovered early to more complex goals discovered by experimenting on simpler ones, our agent autonomously builds its own behavioral repertoire. This naturally occurring curriculum is supplemented by an active learning curriculum resulting from the agent's intrinsic motivations. Experiments are presented with a simulated robotic arm that interacts with several objects including tools.
Meta Automatic Curriculum Learning
A major challenge in the Deep RL (DRL) community is to train agents able to generalize their control policy over situations never seen in training. Training on diverse tasks has been identified as a key ingredient for good generalization, which pushed researchers towards using rich procedural task generation systems controlled through complex continuous parameter spaces. In such complex task spaces, it is essential to rely on some form of Automatic Curriculum Learning (ACL) to adapt the task sampling distribution to a given learning agent, instead of randomly sampling tasks, as many could end up being either trivial or unfeasible. Since it is hard to get prior knowledge on such task spaces, many ACL algorithms explore the task space to detect progress niches over time, a costly tabula-rasa process that needs to be performed for each new learning agent, although these agents might have similarities in their capability profiles. To address this limitation, we introduce the concept of Meta-ACL, and formalize it in the context of black-box RL learners, i.e. algorithms seeking to generalize curriculum generation to an (unknown) distribution of learners. In this work, we present AGAIN, a first instantiation of Meta-ACL, and showcase its benefits for curriculum generation over classical ACL in multiple simulated environments including procedurally generated parkour environments with learners of varying morphologies.